A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples
Authors
Abstract
BACKGROUND: Rater agreement is important in clinical research, and Cohen's Kappa is a widely used method for assessing inter-rater reliability; however, there are well-documented statistical problems associated with the measure. In order to assess its utility, we evaluated it against Gwet's AC1 and compared the results.

METHODS: The study was carried out with 67 patients (56% male) aged 18 to 67 years (mean ± SD, 44.13 ± 12.68). Nine raters (seven psychiatrists, a psychiatry resident, and a social worker) participated as interviewers, in either the first or the second interviews, which were held 4 to 6 weeks apart in order to establish a personality disorder (PD) diagnosis using DSM-IV criteria. Cohen's Kappa and Gwet's AC1 were computed, and the level of agreement between raters was assessed in terms of a simple categorical diagnosis (i.e., the presence or absence of a disorder). The data were also compared with a previous analysis in order to evaluate the effects of trait prevalence.

RESULTS: Gwet's AC1 yielded higher inter-rater reliability coefficients for all the PD criteria, ranging from .752 to 1.000, whereas Cohen's Kappa ranged from 0 to 1.000. Cohen's Kappa values were high and close to the percentage of agreement when prevalence was high, whereas Gwet's AC1 values changed little with prevalence and remained close to the percentage of agreement. For example, the schizoid PD sample yielded a mean Cohen's Kappa of .726 and a Gwet's AC1 of .853, values that fall within different levels of agreement according to the criteria developed by Landis and Koch and by Altman and Fleiss.

CONCLUSIONS: Because of the different formulae used to calculate chance-corrected agreement, Gwet's AC1 provided a more stable inter-rater reliability coefficient than Cohen's Kappa. It was also less affected by prevalence and marginal probability than Cohen's Kappa, and should therefore be considered for use in inter-rater reliability analyses.
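The conclusion turns on the two different chance-correction formulae. As a minimal illustration (not the authors' code, and using hypothetical counts rather than data from the study), the Python sketch below computes observed agreement, Cohen's Kappa, and Gwet's AC1 for two raters making a binary present/absent diagnosis, and shows how a skewed trait prevalence depresses Kappa while AC1 stays close to the percentage of agreement.

    # Minimal sketch (hypothetical counts): Cohen's Kappa vs. Gwet's AC1
    # for two raters and a binary present/absent diagnosis.
    # a = both rate 'present', d = both rate 'absent',
    # b and c = the two kinds of disagreement.
    def agreement_stats(a, b, c, d):
        n = a + b + c + d
        po = (a + d) / n                   # observed proportion of agreement
        # Cohen's Kappa: chance agreement from each rater's marginal probabilities
        p1, p2 = (a + b) / n, (a + c) / n  # each rater's 'present' marginal
        pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
        kappa = (po - pe_kappa) / (1 - pe_kappa)
        # Gwet's AC1: chance agreement from the mean trait prevalence
        pi = (p1 + p2) / 2
        pe_ac1 = 2 * pi * (1 - pi)
        ac1 = (po - pe_ac1) / (1 - pe_ac1)
        return po, kappa, ac1

    # Balanced prevalence: Kappa and AC1 nearly coincide (both about .70).
    print(agreement_stats(30, 5, 5, 27))
    # Rare diagnosis with 91% raw agreement: Kappa collapses to about .20
    # while AC1 stays near the percentage of agreement (about .90),
    # the well-known kappa paradox.
    print(agreement_stats(1, 3, 3, 60))

In the binary case, AC1's chance term is 2π(1 − π), with π the mean 'present' prevalence across the two raters; this term shrinks as the trait becomes very common or very rare, which is why AC1 tracks the observed agreement, whereas Kappa's chance term approaches the observed agreement itself and its denominator collapses.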
Journal title:
Volume 13, Issue
Pages: -
Publication date: 2013